I. INTRODUCTION


Neural  networks  are  a  type  of  distributed  processing  system  [1]. 

They  consist  of  a  large  number  of  cells,  or  processing  nodes,  which  are 
massively  interconnected  together.  The  cells  receive  signals  from  each 
other,  from  other  networks  or  subnetworks,  and  from  the  external 
environment.  They  generate  further  signals  which  are  distributed  through 
the  networks  and  to  the  external  environment.  The  history  of  neural 
network  modelling  and  the  major  concepts  are  reviewed  in  references  [2] 
-  [8],  and  the  status  of  the  field  is  examined  in  [9].  The  two  major 
reasons for using neural networks are: first, their intrinsic parallel
processing  capability;  and  second,  the  associative  adaptation  obtained  by 
varying  the  strength  of  the  interconnection  endpoints  according  to  the 
current  and  past  activity  across  the  network.  The  parallel  capability 
opens  up  a  new  area  of  applications  to  previously  unsolvable  problems 
while  the  adaptation  permits  the  networks  to  modify  their  overall 
behavior  to  fit  the  requirements  of  their  environment. 

The  nodal  activity  and  the  strength  of  the  interconnections  are  the 
variable parameters treated in the neural models. The interconnection
pattern of a neural network is usually considered fixed, and the signal
transmission paths are passive (an exception is the Grossberg masking
field [10]). The pathways introduce, at most, some time delays and
attenuation, both of which are neglected in many, but not all, of the
major neural models [4].

The  output  signal  of  a  cell  is  a  positive  real  scalar  quantity.  It  is 
the  number  of  pulses  per  second,  averaged  over  some  nominal  time,  where 
the  pulses  are  the  spike-like  bursts  emanating  from  the  cell.  This  is  the 
biological  model.  No  significance  has  been  attached  to  the  spike  patterns, 
but  recent  research  [11,12]  indicates  that  this  assumption  is  not  always 
true.  Mathematically,  this  suggests  that  the  output  signal  may  possess 
phased  subsignals  over  some  basis  set,  and  that  future  treatments  should 
include  complex  amplitudes  to  account  for  phase,  and/or  semidigital 
outputs  to  account  for  superpositions  of  basis  functions  as  being  the  net 
signal. 

There  are  many  approaches  being  investigated.  Those  which  attempt 
to  follow  the  biological  models  use  simple  processor  rules  in  the  nodes: 
the  inputs  are  weighted,  summed  and  thresholded  to  produce  the  outputs, 
and the connection strengths are increased on active input channels if the
receiving node is also active. This is the generic additive model with a
Hebbian learning law [13]. Variations include shunting [14] and
error-correction schemes [15,16]. Other approaches use complex programmable
nodes with multiple outputs and nonlocal rules, but with relatively few
nodes and simple nearest-neighbor interconnects, such as the hypercube
system consisting of a group of interconnected microprocessors [17].

Despite  these  numerous  variations,  there  is  a  major  commonality 
among  the  models  when  they  are  viewed  from  their  functional  properties 
and  actual  performance.  They  are  parallel,  recurrent,  adaptive  systems. 
There  is  no  single  unified  model.  Each  model  is  designed  to  handle  a 
particular  type  of  processing  problem  (Table  1).  A  major  research  issue 
is  simply  how  to  relate  the  capabilities  and  performance  of  various 
models  to  actual  problems  and  applications:  how,  for  example,  can  an 
optical  image  be  processed  so  that  the  objects  can  be  reliably  located  and 
identified,  or  more  generally,  how  can  invariant  features  be  extracted 
from  a  complex  signal  distribution? 




TABLE 1. Neural network models

Model                               Characteristic Use
----------------------------------  ----------------------
Adaptive resonance [8]              Hypothesis testing
Back propagation [15]               Supervised learning
BAM, ABAM [13]                      Stable adaptation
Crossbar/Hopfield [18]              Optimization
Kohonen [8]                         Mapping
Neocognitron [8]                    Recognition
Boltzmann/Cauchy [8]                Optimization
Symbolic substitution [19]          Digital/optical logic
Adaline [16]                        Nulling
Perceptron [20]                     Nulling
Avalanche [4]                       Time sequencing
Shunting [4]                        Competitive
Masking fields [10]                 Groupings
Counter propagation [8]             Probability mapping
Higher-order learning units [8]     Invariant filters


The  functional  properties  of  current  neural  network  models  are 
expressed  mathematically  as  a  first-order  time  derivative  of  the  internal 
activity  of  the  ith  cell  set  equal  to  a  collection  of  terms.  Each  term 
carries  a  specific  interpretation,  and  the  terms  and  certain  combinations 
of  the  terms  endow  the  networks  with  their  functional  properties.  A 
neural  network  model,  in  general,  is  described  by: 

1.  A  statement  of  the  fixed  interconnect  matrix. 

2.  A  rule  or  set  of  rules  describing  the  nodal  activity’s  temporal 
behavior  and  its  inputs. 

3.  A  learning  law  describing  how  the  strength  of  an  interconnection 
point  changes  in  time. 

4.  The  thresholding  function. 

5.  A  stability  function. 

Since  the  basic  interconnect  matrix  is  fixed,  it  is  rarely  discussed  as  a 
separate  entity.  Instead,  it  is  combined  with  the  second  and  third 
variables  of  nodal  activity  and  interconnect  strengths,  respectively.  But 
for  nets  which  perform  fixed  logic,  [21]  for  example,  the  interconnect 
matrix  is  the  dominant  parameter,  because  in  such  systems  the 
interconnect strengths are fixed and the nodal signals are flexible. We
will be concerned with variable systems, and will not discuss the
interconnect matrix in detail.

II.  FUNCTIONAL  PROPERTIES 

The internal activity of the ith cell follows the general dynamical form

$$\frac{da_i}{dt} = \{\text{Loss Term}\} + \{\text{Excitation}\} + \{\text{Inhibition}\} + \{\text{Adaptation}\}. \tag{1}$$

(While  other  forms  are  possible,  this  is  by  far  the  most  accepted  version.) 

The  loss  term  describes  a  relaxation  mechanism  leading  to  the  steady 
state.  It  is  a  simple  exponential  decay: 

$$\{\text{Loss Term}\} = -A\,a_i. \tag{2}$$

It can be interpreted as attenuation, absorption, or subtraction. An
alternate interpretation is obtained by combining the loss term with the
time derivative on the LHS of Eq. (1) and noting that, to first order and
up to the factor $A$, they approximate $a_i$ evaluated at a time
$t + 1/A$. This can be read as a statement of causality, or as indicating
a feedback loop with a finite time delay of $1/A$.
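
The delay reading of the loss term can be checked with a first-order Taylor expansion. The short derivation below is a sketch, writing $a_i$ for the activity of the ith cell and $A$ for the loss constant of Eq. (2):

```latex
a_i\!\left(t + \tfrac{1}{A}\right) \approx a_i(t) + \frac{1}{A}\,\frac{da_i}{dt}
\quad\Longrightarrow\quad
\frac{da_i}{dt} + A\,a_i(t) \approx A\,a_i\!\left(t + \tfrac{1}{A}\right).
```

To first order, the time derivative and the loss term together therefore equal $A$ times the activity advanced by $1/A$.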

The next term is responsible for providing excitation. It can be fixed
or adaptive, linear or nonlinear, but its basic feature is that it is a
positive and (usually) increasing function of the inputs at a given cell. A
somewhat similar term, but with a negative sign, provides an inhibitory
input. A variation is to use a combined product term where the numerator
is designated as the excitation and the denominator is the inhibiting
factor, which serves as a type of loss as well. Both excitatory and
inhibitory contributions can be fixed or adaptive. Often, these
contributions are in the form of an input signal distribution with a
matrix-vector type of connection to the cell. The adaptation is
assumed in all models to occur by providing a mechanism for
varying the strength of the interconnections. The choice of variation,
called the learning law, depends on the particular neural network model,
since these terms vary in form in different models. Specific discussions
are given in the next section.

III. MAJOR MODELS

A. Model Equations

The  five  most  common  neural  models  are  the  generic  additive  model, 
the  shunting  models,  the  two-slab  bidirectional  associative  memory 
(BAM), the back propagation model, and the Kohonen model. Their nodal
equations are:

1. Generic Additive Model

$$\dot{a}_i = -A\,a_i + I_i + \sum_{j} m_{ij}\, S(a_j) \tag{3}$$

where $S(a_j)$ is the thresholding function and $m_{ij}$ governs the learning
law, to be discussed later. $I_i$ is an input signal from the external
environment.
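
The additive model can be explored numerically. The following is a minimal sketch, not the report's own procedure: it integrates the nodal equation with a forward-Euler step, assuming a tanh threshold, fixed (non-adapting) weights, and a hypothetical 3-cell network whose values are chosen only for illustration.

```python
import math

def simulate_additive(m, inputs, A=1.0, dt=0.01, steps=3000):
    """Forward-Euler integration of da_i/dt = -A*a_i + I_i + sum_j m_ij*S(a_j)."""
    n = len(inputs)
    a = [0.0] * n
    for _ in range(steps):
        s = [math.tanh(x) for x in a]  # threshold S(a); tanh is an assumed choice
        a = [a[i] + dt * (-A * a[i] + inputs[i]
                          + sum(m[i][j] * s[j] for j in range(n)))
             for i in range(n)]
    return a

def residual(m, inputs, a, A=1.0):
    """Right-hand side of the nodal equation; it vanishes at a steady state."""
    n = len(a)
    s = [math.tanh(x) for x in a]
    return [-A * a[i] + inputs[i] + sum(m[i][j] * s[j] for j in range(n))
            for i in range(n)]

# Hypothetical 3-cell network with weak, fixed interconnection strengths.
m = [[0.0, 0.1, -0.1],
     [0.1, 0.0, 0.1],
     [-0.1, 0.1, 0.0]]
inputs = [0.5, 0.2, 0.0]
a_final = simulate_additive(m, inputs)
```

With weak weights the loss term dominates, so the activities relax to a steady state at which the right-hand side of the equation vanishes.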

2. Shunting Model (Membrane Equation)

$$\dot{a}_i = -A\,a_i + (B - a_i)\,Q_i - (C + a_i)\,P_i \tag{4}$$

where $Q_i$ is the excitatory input, $P_i$ is the inhibitory input, and $A$, $B$,
and $C$ are constants.
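
Setting the right-hand side of Eq. (4) to zero gives the steady state $a = (BQ - CP)/(A + Q + P)$, which stays between $-C$ and $B$ for non-negative inputs. The sketch below checks this against a forward-Euler integration; the constants and input values are hypothetical and chosen only for illustration.

```python
def shunting_steady_state(Q, P, A=1.0, B=1.0, C=0.5):
    """Closed-form fixed point of Eq. (4): a = (B*Q - C*P) / (A + Q + P)."""
    return (B * Q - C * P) / (A + Q + P)

def simulate_shunting(Q, P, A=1.0, B=1.0, C=0.5, dt=0.001, steps=20000):
    """Forward-Euler integration of da/dt = -A*a + (B - a)*Q - (C + a)*P."""
    a = 0.0
    for _ in range(steps):
        a += dt * (-A * a + (B - a) * Q - (C + a) * P)
    return a

# Hypothetical inputs: a weak and a very strong excitation.
a_weak = simulate_shunting(Q=0.5, P=0.2)
a_strong = simulate_shunting(Q=500.0, P=0.2)
```

Even when the excitatory input grows by three orders of magnitude, the activity saturates below $B$ instead of growing without bound, which is the automatic renormalization attributed to shunting networks.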

3. BAM Model (Two Slabs)

$$\dot{a}_i = -A\,a_i + I_i + \sum_{\text{all } j} m_{ij}\, S(b_j), \tag{5}$$

$$\dot{b}_j = -A\,b_j + J_j + \sum_{\text{all } i} m_{ij}\, S(a_i), \tag{6}$$

where $I_i$ and $J_j$ are the input signals.
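
The two-slab coupling can be illustrated with a short simulation. This is a sketch under stated assumptions: a tanh threshold, fixed weights (no learning), forward-Euler integration, and a hypothetical 3-by-2 weight matrix. Only slab a receives external input, so any activity on slab b arrives through the interconnections.

```python
import math

def simulate_bam(m, I, J, A=1.0, dt=0.01, steps=5000):
    """Forward-Euler integration of the two-slab equations.

    m[i][j] couples cell i of slab a and cell j of slab b in both directions.
    """
    n, p = len(I), len(J)
    a, b = [0.0] * n, [0.0] * p
    for _ in range(steps):
        sa = [math.tanh(x) for x in a]  # assumed threshold function
        sb = [math.tanh(x) for x in b]
        da = [-A * a[i] + I[i] + sum(m[i][j] * sb[j] for j in range(p))
              for i in range(n)]
        db = [-A * b[j] + J[j] + sum(m[i][j] * sa[i] for i in range(n))
              for j in range(p)]
        a = [a[i] + dt * da[i] for i in range(n)]
        b = [b[j] + dt * db[j] for j in range(p)]
    return a, b

# Hypothetical weights and inputs, for illustration only.
m = [[0.3, -0.2],
     [0.1, 0.4],
     [-0.3, 0.2]]
I = [1.0, 0.0, 0.5]
J = [0.0, 0.0]
a, b = simulate_bam(m, I, J)
```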

4.  Back  Propagation 


(nth  slab)  (7) 


5.  Kohonen  Model 


$$a_i = \sum_{\text{all } j} m_{ij}\, I_j \tag{8}$$

There  are  several  learning  laws  as  well.  Most  are  based  on  Hebb’s 
observation  of  pairwise  association  [22]. 

B. Learning Laws

The  learning  laws  operate  on  a  much  slower  time  scale  than  the 
nodal  activity.  The  most  prevalent  law  is  an  outer  product  of  the  output 
signal of the ith cell and its jth input signal, with an exponential decay
relaxation term [13]. However, some learning laws act on a still longer
time scale. For these, each increment of the weight is the steady-state
outer product, and these increments are summed over a training cycle
time longer than the relaxation time. This is used in the back propagation
model,  and  in  the  Kohonen  model,  and  is  similar  to  the  covariance  product 
sums  used  in  adaptive  phased  array  radar. 

Another  important  variant  is  the  Grossberg  competitive  law  [8]  in 
which  the  relaxation  time  is  proportional  to  the  output  signal  strength  of 
the  receiving  cell. 

The  major  learning  laws  are: 


1. Covariant

$$m_{ij} = S(a_i)\,S(a_j). \tag{9}$$

2. Hebbian

$$\dot{m}_{ij} = -D\,m_{ij} + S(a_i)\,S(a_j). \tag{10}$$

3. Competitive

$$\dot{m}_{ij} = S(a_i)\,\bigl[-m_{ij} + S(a_j)\bigr] \tag{11}$$

and the Oja variation [23]:

$$\dot{m}_{ij} = S(a_i)\,\bigl[-S(a_i)\,m_{ij} + S(a_j)\bigr]. \tag{12}$$

4. Backward Propagation

Error signals $\delta_i^{(n)}$ for the n-th path are:

$$\delta_i^{(n)} = S'\bigl(a_i^{(n)}\bigr) \sum_j m_{ji}^{(n+1)}\, \delta_j^{(n+1)}, \tag{13}$$

$$\Delta m_{ij}^{(n)} = \alpha\, \delta_i^{(n)}\, S\bigl(a_j^{(n-1)}\bigr), \tag{14}$$

$$\delta_i^{(N)} = S'\bigl(a_i^{(N)}\bigr)\,\bigl[T_i - S\bigl(a_i^{(N)}\bigr)\bigr], \tag{15}$$

where $N$ indicates the final output slab, $S'(u) = \partial S(u)/\partial u$, and $T_i$ is a
training signal.

5. Kohonen

$$\Delta m_{ij} = \alpha\,(I_j - m_{ij})\,z_i \tag{16}$$

where $z_i = 1$ if $a_i = \max_k \{a_k\}$, and is zero otherwise.
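
The behavior of these laws is easiest to see for a single weight with constant signals. The sketch below is illustrative only and assumes the following readings of the laws: Hebbian $\dot m = -Dm + S(a_i)S(a_j)$, competitive $\dot m = S(a_i)[-m + S(a_j)]$, Oja $\dot m = S(a_i)[-S(a_i)m + S(a_j)]$, and the winner-take-all Kohonen update. The function names, step sizes, and signal values are hypothetical.

```python
def hebbian_step(m, s_i, s_j, D=0.5, dt=0.01):
    """One Euler step of dm/dt = -D*m + S(a_i)*S(a_j)."""
    return m + dt * (-D * m + s_i * s_j)

def competitive_step(m, s_i, s_j, dt=0.01):
    """One Euler step of dm/dt = S(a_i) * (-m + S(a_j))."""
    return m + dt * s_i * (-m + s_j)

def oja_step(m, s_i, s_j, dt=0.01):
    """One Euler step of dm/dt = S(a_i) * (-S(a_i)*m + S(a_j))."""
    return m + dt * s_i * (-s_i * m + s_j)

def run_to_steady_state(step, s_i, s_j, steps=5000):
    """Iterate a single-weight law from m = 0 with constant signals."""
    m = 0.0
    for _ in range(steps):
        m = step(m, s_i, s_j)
    return m

def kohonen_update(weights, activities, inputs, alpha=0.1):
    """Only the most active cell moves its weight row toward the input."""
    winner = max(range(len(activities)), key=lambda i: activities[i])
    updated = [row[:] for row in weights]
    updated[winner] = [w + alpha * (x - w)
                       for w, x in zip(weights[winner], inputs)]
    return updated, winner

# Hypothetical constant output signals of the receiving and sending cells.
s_i, s_j = 0.8, 0.4
m_hebb = run_to_steady_state(hebbian_step, s_i, s_j)
m_comp = run_to_steady_state(competitive_step, s_i, s_j)
m_oja = run_to_steady_state(oja_step, s_i, s_j)
```

Under these assumptions the Hebbian weight settles at $S(a_i)S(a_j)/D$, the competitive weight tracks the input signal $S(a_j)$ at a rate set by the receiving cell, and the Oja weight settles at $S(a_j)/S(a_i)$.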

C.  Thresholding  Function 

The  threshold  function  describes  the  formation  of  a  cell's  output 
signal  as  a  function  of  its  internal  activity.  Early  neural  models  used  a 
linear response, but the current second-generation models use a nonlinear
response. This simple innovation is of fundamental importance because it
removes the inability of the earlier networks to provide essential
nonlinear learning [24].

Threshold functions:

1. Step Function

$$S(a_i) = \begin{cases} 1, & a_i > 0 \\ 0, & a_i \le 0 \end{cases} \tag{17}$$

2. "Offset"

$$S(a_i) = [a_i]^+ = \begin{cases} a_i, & a_i > 0 \\ 0, & a_i \le 0 \end{cases} \tag{18}$$

3. Hebbian

$$\text{(i)}\quad S(a_i) = \tanh a_i, \qquad \text{(ii)}\quad S(a_i) = \frac{a_i^2}{1 + a_i^2}. \tag{19}$$

Bias  terms  and  scale  factors  can  be  incorporated  in  the  threshold 
functions. 
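
The threshold functions can be written compactly in code. The sketch below makes two assumptions that the text does not fix: the step function outputs 1 and 0, and bias and scale enter as $S(\text{scale}\cdot(a - \text{bias}))$. The function names are hypothetical.

```python
import math

def step(a):
    """Step threshold: 1 for positive activity, 0 otherwise (assumed levels)."""
    return 1.0 if a > 0 else 0.0

def offset(a):
    """'Offset' threshold [a]^+: passes positive activity, clips the rest to 0."""
    return a if a > 0 else 0.0

def sigmoid_tanh(a):
    """Smooth threshold S(a) = tanh(a)."""
    return math.tanh(a)

def sigmoid_rational(a):
    """Smooth threshold S(a) = a^2 / (1 + a^2)."""
    return a * a / (1.0 + a * a)

def with_bias_and_scale(S, bias=0.0, scale=1.0):
    """Wrap a threshold S so it acts on scale * (a - bias); form is an assumption."""
    return lambda a: S(scale * (a - bias))

shifted_step = with_bias_and_scale(step, bias=0.5)
```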

D.  Stability 

Another  major  feature  is  the  stability  of  a  network  [25].  It  is 
believed  desirable  for  the  network  to  approach  a  single  optimum  state 
after  being  given  an  initial  input  distribution.  Various  global  quantities 
with quadratic minima have been defined [1], [16] and used with success
to  design  networks  that  will  produce  optimizations  in  a  stable  manner. 
These  Lyapunov  functions  [25],  "energy"  functions  [8],  and  error  functions 
[1]  are  of  great  theoretical  interest,  but  for  the  purpose  of  this  paper  the 
nodal  activity  equation  and  the  learning  law  are  adequate  guides. 

In  Equations  (1)  -  (19)  the  term  types  are  (excluding  third  rank  and 
higher  tensors  [30,  31,  32]) 

$a_i$ : linear

$S(a_i)$ : thresholded

$\sum_j m_{ij}\, S(a_j)$ : matrix-vector products

and  the  operations  include  multiplication,  division,  addition,  subtraction, 
differentiation,  and  integration. 

E.  Discussion 

Description of functions performed by the models
The  generic  additive  model  and  the  two-slab  BAM  both  use  the 
Hebbian  learning  law.  These  nets  associate  inputs  because  one  input 
distribution  is  encoded  as  the  interconnects  of  each  node  activated  by 
another  distribution.  Later  exposure  of  either  input  will  reactivate  the 
other  input.  If  part  of  an  input  is  missing,  it  will  be  filled  in  because  the 
recalled  input  will,  in  turn,  attempt  to  reactivate  all  of  the  nodes  in  the 
first  input.  Sequences  can  be  encoded  on  a  pairwise  basis  AB,  BC,  CD, 
etc.  and  superimposed  to  form  an  asymmetric  memory  matrix.  Temporal 
order  can  be  restored  either  by  adjusting  the  nodal  activity  time  constant 
or  by  using  a  Hebbian  learning  law  which  responds  to  the  covariance  of 
the first time derivatives of the nodal output signals (differential
Hebbian). Sequences can be recognized by using an avalanche network; a
set of "grandmother" cells (cells tuned to recognize specific patterns) is
arranged  so  that  each  cell's  output  excites  only  the  next  cell  in  the 
sequence.  All  the  cells  receive  the  time-varying  input.  If  the  input 
matches  the  desired  sequence,  the  corresponding  set  of  cells  will 
reinforce  each  other  in  turn,  and  produce  a  recognition  signal  at  the  end  of 
the  sequence.  Still  other  variations  are  possible.  The  back  propagation 
and  the  Kohonen  models  both  incorporate  an  adaptive  fan-out  of  the  input 
distribution.  The  Kohonen  system  self-organizes  so  that  each  cell 
responds  best  to  specific  sub-inputs  which  are  closely  grouped  in  the 
feature  space,  and  thus  this  model  yields  good  statistical  approximations 
to  the  overall  input  distribution.  The  back  propagation  model  also 
receives  a  training  input.  It  forms  an  error  signal  as  the  difference 
between the actual final output and the training signal of those nodes.
This error is propagated back through the interconnect system to form
new error signals at every node. The weights are incremented in
proportion to the covariance of the errors and the input at each cell. It is
a remapping network with good statistical invariance to input signals.

The shunting networks use a variety of learning laws. They are very
powerful,  general-purpose  nets  which  effectively  deal  with  random  and 
patterned  noise,  and  also  automatically  renormalize  and  enhance  their 
activity  prior  to  the  slower  adaptation  processes  performing  the  adaptive 
encoding. 

IV.  OPTICAL  IMPLEMENTATION 

In this section we explain how some of the system equations can be
implemented in an optical system. There are two major problems intrinsic
to the project that cannot be easily resolved. The first concerns the
identification of the activity $a_i$ with an optical quantity, which can be
either amplitude or intensity. $a_i$, being positive in the neural network
model, should be interpreted as an intensity which, however, does not
appear in optics without manipulation. In other words, the amplitude in
optics must be replaced by [Intensity]^{1/2}, and this may or may not be
justified. If $a_i$ is considered as an amplitude, a complex quantity, then
the phase factor does not admit any interpretation in neural network
models. Since this problem cannot be resolved easily and the
generalization of $a_i$ to a complex quantity cannot be done at the present
time, we will use the slowly varying part of the amplitude in optics as
the desired positive number.

Another problem concerns the relaxation time, which in optics is shorter
than nanoseconds, whereas in neural networks the time scale is on the
order of milliseconds. Because of this difference the two systems, neural
nets and optical wave equations, are never physically equivalent. Again,
we can only keep the problem in perspective.

A.  Thresholding 

The sigmoid function $S(a_i)$ is the output for a given input $a_i$ in
the cell. This can be accomplished in several different ways. One method
is to take advantage of an amplifying medium. Four-wave mixing [26] or
a laser amplifier [27] can both achieve this goal. However, all of these
methods transfer the energy from external beams to the amplified beam.
The disadvantage of setting up the external sources outweighs whatever
advantage can be gained.

The right choice of nonlinear material can provide the bistable
characteristic curve shown in Fig. 1, where $E_{in}$ and $E_{out}$ are the incoming and
outgoing amplitudes, respectively. This optical bistable device is easy to
set up, and its theory can be summarized in one equation, Eq. (20) [28].


r2  eo<-  -  .  r2  e-«-  *  l« 


2 


where 


(20) 


r„  =  ( i  ♦  J7)  ft  ±  ( i  - 77)  (20 

and r is the reflectivity at the medium. The nonlinear part of the
dielectric constant ε enters through


I 


CX 


(22) 


where L/λ is the ratio of the cavity length and wavelength.

Although Eq. (20) is an approximate solution, the accuracy is good
enough for our purpose. The next question concerns the material. This
subject has been reviewed elsewhere [29]. Many materials are available,
depending on the requirements on the thresholding or the switching time.
If no stringent conditions are imposed, then any material with nonlinearity
will suffice. In the definition of the dielectric constant, ε is the sum of
$\varepsilon_\infty$ and a term in the susceptibility $\chi$, and

$$\chi = \chi_1 + \chi_2 E + \chi_3\, E \cdot E + \dots \tag{23}$$

The case where $\chi_3 \neq 0$ and $\chi_2 = 0$ defines cubic nonlinear terms.
When $\chi_2$ is present, the effects of the quadratic nonlinearity (frequency
doubling, etc.) will dominate the desired bistable effect.

We should point out that the nonlinear optical material considered
here and the photorefractive material used later can both perform the
thresholding of this section and the associative memory of the next
section. The adoption of the usual nonlinear bistable material described
by Eq. (20) (for example, GaAs, CdS, etc.) for thresholding, while
associative memory is treated with a photorefractive material such as
BaTiO3, is a matter of practical convenience. For example, it has been
reported [30] that GaAs can perform the four-wave mixing just as well as
the photorefractive material, with vastly improved speed at the expense of
a much larger required intensity. In this report we follow the
conventional application in the literature.

B. Adaptive Term and Photorefractive Medium

The  index  of  refraction  of  a  photorefractive  material  can  be 
written  as 


+ 


+ 


+ 


i<Pa  (  m  *1 

na  e  I  A|  a4  +  a2A3 I  exP  'ka  ‘  r  +  c-  c- 

i<Pt>  I  * 

nb  9  \\  a3  +  A2A4/  exp  ikb  ‘  r  +  c.  c- 

i<pc  * 

n  e  A ,  exp  ik<~  •  r  +  c.  c. 

c  1  2  c 

i<P<j  » 

n,  e  A_  A,  exp  ikn  •  r  +  c.  c.  (24) 

d  3  4 


where $n_a, \dots, n_d$ are the optical nonlinearities and $\varphi_a, \dots, \varphi_d$ are
constants. Equation (24) is simply the index of refraction according to
the cubic nonlinearity with the specific conditions due to the arrangement


k 


1 


(25) 


(26) 
When n is substituted into the Maxwell equation and each component of
$\exp(i\mathbf{k}_i \cdot \mathbf{x})$ (i = 1, 2, 3, 4) is identified, we obtain [26]

(az  *  c”ft)A|  =  Q|A4  "  Q2A3  "  °3A2  (27a^ 

(ic  iK  =  q>3  •  QA  •  Vi  (27b) 

=  -  O.  A2  -  QjA,  -  Q4A4  (270 

(i  *  it)A4  =  *  <A,  *  Q2A2  -  Vs  (27d) 

where $Q_i$ (i = 1, 2, 3, 4) obey the Debye equations

*,Q,  *  Q,  1  TT  (Ai\  *  Vj)  (28a) 

*2°2  *  °2  =  77  (A!A3  *  V<)  (28b) 

(28c)

(28d) 


and $I_0 = \sum_i I_i = \sum_i |A_i|^2$.

Notice that the Maxwell equations imply $\dot{Q}_i = 0$. In Eq. (28) the
$\dot{Q}_i$ term is often introduced, as is done here, to reflect the buildup time
of a hologram from various beams in the medium. Equations (27) - (28)
look very similar to the BAM model but with some important differences.
The first is the use of complex numbers throughout the formalism.
The phase factor is crucial in optics, although the corresponding network
activities $a_i$ are supposed to be real. Another is the difference
between $a_i$ and $S(a_i)$, which must be addressed by means of thresholding.
In view of these difficulties, we cannot take this set of equations and try
to identify them as a part of BAM. The photorefractive interference
discussed above will, however, be used as a "component" of the BAM to
perform as the adaptive term. This is illustrated in what follows.


We follow here one example set up by Yariv et al. [31]. The pump
beams in Fig. 2 are to be identified as $S(a_i)$ and $S^*(b_j)$. Then the
nonlinear part of the index of refraction is

$$\Delta n = \sum_{ij} S^*(a_i)\, S(b_j)\, A_{ij} \tag{29}$$

where 

A. . 

|J 


+  i 


F.-y 


(30) 


and $\Delta n$ is stored in a hologram. When a beam $E'$ in the direction of $S(a)$
shines on the hologram, a diffracted beam

$$E_{\text{diff}} = J\, e^{-i\mathbf{k}_b \cdot \mathbf{r}} \tag{31}$$


is  produced,  and 


J  =  J  eV)  S*(aj)  S(bj)  d3r'  exp{ik(xx'  +  yy')/r} 
S*(a)

is the overlapping integral. The reflected field moves opposite to $S(b_j)$
$\propto S^*(a)\, S(b)$, and this reflected beam excites $\Delta n$ again to produce a
field proportional to


An  Eout  =  J  S(bj)  ^  S*(a) 


(32) 


The net result, if all contributions are considered, can be expressed as

$$(\Delta E)_i = \sum_j m_{ij}\, S(b_j) \tag{33}$$

and

$$\tau\,\dot{m}_{ij} + m_{ij} = S^*(a_i)\, S(b_j). \tag{34}$$

In actuality, $m_{ij} \sim S^*(a)\, S(b)$, but the time derivative in Eq. (34) is
added to reflect the time delay between the turn-on of $S^*(a)$ and $S(b)$ and
the production of $(\Delta E)_i$. Equations (33) and (34) show how the
adaptive term can be produced with the output proportional to $J$, the
overlapping integral. This scheme is advantageous for training and
learning applications, i.e., the stored information $\Delta n$ is retrieved by
the incident wave $E_i$.

For our purpose of demonstrating the adaptive term in BAM, a
simpler setup can be given as follows. The hologram is set up as

$$\Delta n = \sum_{i,j} S^*(a_i)\, S(b_j) \tag{35}$$

by two pump beams. A probe beam $S^*(b_j)$ is then diffracted from
the grating to produce the conjugate beam $(\Delta S(a_i))_c$.
Or, in our notation,

$$(\Delta S(a_i))_c = \sum_j m_{ij}\, S^*(b_j) \tag{36}$$

and

$$\tau\,\dot{m}_{ij} + m_{ij} = S^*(a_i)\, S(b_j). \tag{37}$$

Again, the time derivative is added to account for the buildup time of
the diffraction grating.

A short summary of the discussion is given here. We demonstrate how
an optical beam called $(\Delta S(a))_c$ can be produced for a given set of
$(a_i, b_j)$ as shown in Fig. 3, when $b_j \to S(b_j)$, $a_i \to S(a_i)$ by the
thresholding processes. $(\Delta S(a_i))_c$ then is numerically equal to
$\sum_j m_{ij}\, S(b_j)$ with $m_{ij}$ defined in Eq. (37).
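
The buildup described by the relaxation equation for $m_{ij}$ can be sketched numerically. The example below assumes the reading $\tau\,\dot{m} + m = S^*(a)\,S(b)$ for a single weight, constant complex amplitudes, and forward-Euler integration; the amplitudes and the time constant are hypothetical values.

```python
import math

def grating_buildup(s_a, s_b, tau=1.0, dt=0.001, t_end=10.0):
    """Euler integration of tau*dm/dt + m = conj(S(a)) * S(b), starting at m = 0."""
    target = s_a.conjugate() * s_b
    m = 0j
    for _ in range(int(round(t_end / dt))):
        m += (dt / tau) * (target - m)
    return m

# Hypothetical thresholded complex amplitudes of the two writing beams.
s_a = complex(0.6, 0.3)
s_b = complex(0.2, -0.5)
target = s_a.conjugate() * s_b

m_final = grating_buildup(s_a, s_b)              # after ten time constants
m_early = grating_buildup(s_a, s_b, t_end=1.0)   # after one time constant
expected_fraction = 1.0 - math.exp(-1.0)         # analytic buildup at t = tau
```

After one time constant the weight has reached about 63 percent of its final value, and after many time constants it equals the product $S^*(a)\,S(b)$ in both magnitude and phase.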

A major point to be made is that currently the only optical effect
which has been found useful for implementing associative memories is the
photorefractive effect. There is no other way to do it without going to
hybrid systems.

C.  Shunting  Mechanism. 

The shunting term is of the form given by

$$a_i \sum_j S(a_j)$$

which is proportional to the product of the two amplitudes $a_i$ and $S(a_j)$,
in contrast to the triple product for the adaptive case. It seems one
should be able to use the $\chi_2$ term to generate this term. This cannot be
carried out because this nonlinear term has its dominant effect on the
phase in the wave propagation and not on the absolute magnitude of the
amplitude.

We have examined the technique of optical correlation in the
literature and found that our need is much simpler, since no spatial
information is contained in $a_i$. To accomplish the shunting term we use
the geometry in Figure 4, where a photorefractive crystal is present and
the two pump beams $a_i$ and $\sum_j S(a_j)$ form a grating according to the
shunting term

$$\Delta n = a_i \sum_j S(a_j)\, e^{i(\mathbf{k}_i - \mathbf{k}_j)\cdot\mathbf{r}}. \tag{38}$$

Then a beam of constant amplitude $c\, e^{-i\mathbf{k}_c \cdot \mathbf{r}}$ will be scattered from $\Delta n$,
with the resultant amplitude proportional to $A_c$:

$$A_c \propto c\, a_i \sum_j S(a_j)\, e^{i(\mathbf{k}_i - \mathbf{k}_j - \mathbf{k}_c)\cdot\mathbf{r}}. \tag{39}$$

We note that this term is just the shunting amplitude with specific
direction $(\mathbf{k}_i - \mathbf{k}_j - \mathbf{k}_c)$.

D. Amplification

When the optical system is designed, absorption in the system,
intrinsic or external, cannot be avoided. At certain stages of operation
the optical amplitude must be amplified to sustain the operation.
Four-wave mixing in a nonlinear medium can accomplish this goal.

$E_1$ and $E_2$ in the diagram of Fig. 5 are the pump beams. $E_4$ is the
conjugated beam at $z = L$ and $E_3$ is the probe beam, which satisfy

FIGURE 5. NONLINEAR BEAM AMPLIFICATION GEOMETRY.


where $\gamma_s$ is the coupling constant and $\Delta k = |\mathbf{k}_1 + \mathbf{k}_2 - \mathbf{k}_3 - \mathbf{k}_4|$ is the
phase mismatch, which in general is zero. Consequently, $E_4/E_3$ can
take almost any value when $\Delta k\,L \ll 1$.

In summary, we have demonstrated the optical elements performing
the following tasks for a given set of $(a_i, b_j)$:

(1) Thresholding to convert
$a_i \to S(a_i)$, $b_j \to S(b_j)$

(2) Production of neural adaptive changes
$\Delta = \sum_j m_{ij}\, S(b_j)$

where

$m_{ij} + \tau\,\dot{m}_{ij} = S(a_i)\, S^*(b_j)$.

(3) Performing the shunting as dictated by

$a_i \sum_j S(a_j)$

(4) Producing the amplified signal, which is accomplished by
integrating all elements as described in this section.

V.  CANDIDATE  IMPLEMENTATIONS  OF  THE  NEURAL  NETWORKS 

A. Conceptual Architectures for Optical Neural Networks

The  objective  of  this  section  is  to  devise  candidate 
implementations  of  the  neural  networks  discussed  in  the  first  section 
using  the  optical  effects  discussed  in  the  second  section.  Where  possible, 
all-optical  architectures  are  chosen  that  do  not  have  detector  arrays  and 
electronically  converted  video  data.  However,  some  hybrid  techniques 
have  been  used  where  they  appeared  to  be  the  only  method  available. 

B. Building Blocks

The photorefractive effect has been used as the preferred method
for associative memory. The usual approach of storing a hologram made
from the mutual interference of two input images suffers from the drawback
of low output during recall. This is because the output is a diffracted
reconstruction. An elegant solution which provides full-strength
reconstruction has been shown by Stoll and Lee, and their technique
will be used here. Basically, their system consists of a
cascaded pair of matched filter correlators which encode given
pairs of images by multiplying them with a different reference
angle for each pair. Thus a given reference beam exits
between the matched filters, and it is amplified by a nonlinear
crystal prior to its reading of the second matched filter.

Their system has a fundamental difficulty that prevents its use as a
fully adaptive optical associative system. The associative encoding
process and the ensuing readout process are performed separately. Thus
the encoding activity does not account for the modification of the output
signals as the associative matrix is formed. This is a problem found with
almost all adaptive associative systems using the photorefractive effect.
It can be resolved by use of polarization-switching dynamic volume
holograms as first shown by Psaltis [8]. By combining the Psaltis
technique with Stoll and Lee's system, an adaptive optical associative
architecture can be devised and is shown in Figure 6. Its operation is as
follows: Two photorefractive crystals A with a gain crystal B are
arranged in the Stoll and Lee configuration with a reference beam θ
being provided to each crystal A. The input distributions $S_a$ and $S_b$
to be associated pass through nonreciprocal Faraday rotators C as in the
Psaltis system. θ has a polarization at 90°. $S_a$ and $S_b$ are placed at
-45°. The rotators C produce a +45° rotation to $S_a$ and $S_b$ so that
when they enter the photorefractive crystals A they are also polarized
at 90°. They interfere with the θ-beam to produce the desired volume
holograms in the crystals A. As the holograms form, $S_a$ and $S_b$ are
diffracted from them. The diffracted beams, due to the large angles
between the $S_a$, $S_b$ and θ beams, are polarization-switched to the
orthogonal 0° polarization state as in the Psaltis arrangement of
Reference 8. These diffracted beams pass through the gain crystal B and
are incident upon the photorefractive crystals again, where they interact
with the volume holograms and are again diffracted to form the output
beams $K_{ba}$ and $K_{ab}$. These beams also undergo polarization-switching
from 0° to 90°. They pass through the rotators C and emerge at a
polarization angle of +45°, orthogonal to the $S_a$ and $S_b$ beams. They
can then be separated by a polarizing beamsplitter. They are the adaptive
inputs to the slabs generating the signals $S_a$ and $S_b$. Thus the
adaptive contributions are available during the encoding process. After
the system reaches equilibrium, new pairs of slab inputs (not shown) can
be presented with the angular reference beam θ reset to a new angle.

FIGURE 6. ASSOCIATIVE OPTICS

The  remaining  building  blocks  are  less  complex.  Figure  7  shows  the 
threshold crystal. It can be a bistable device, or a pumped BaTiO3
operating in the saturation regime. Figure 8 shows how an optical
feedback loop provides the necessary time delay factor to generate the
loss term in the nodal activity. Shunting can be achieved by the technique
discussed earlier, or by a hybrid implementation in which an SLM is
inserted into the time delay loop to vary the splitting coefficient k in
proportion  to  the  local  average  of  the  slab  activity. 
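
How a delay loop can produce a loss term may be easier to see in a discrete-time sketch. The model below is an assumption made for illustration, not the circuit of Figure 8: on each round trip a fraction k of the circulating signal is fed back and added to the fresh input, and the function name `delay_loop` is hypothetical.

```python
def delay_loop(inputs, k):
    """Simulate a feedback loop with splitting coefficient k (0 <= k < 1).

    Each round trip returns a fraction k of the circulating activity and adds
    the new input:  x[n+1] = k * x[n] + u[n].
    Rewritten as x[n+1] - x[n] = -(1 - k) * x[n] + u[n], the loop behaves like
    a discrete form of a decay (loss) term with rate (1 - k) per round trip.
    """
    x = 0.0
    trajectory = []
    for u in inputs:
        x = k * x + u
        trajectory.append(x)
    return trajectory


# Constant drive: the activity settles where loss balances input, u / (1 - k).
trace = delay_loop([1.0] * 200, k=0.9)
```

In this picture, letting an SLM change k according to the local average activity makes the effective decay rate activity-dependent, which is one way to read the hybrid shunting scheme described above.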

With  these  building  blocks,  the  following  optical  architectures  can  be 
schematically  developed  for  the  additive,  BAM,  and  backpropagation  nets. 
Inspection  of  the  defining  equations  for  the  BAM  and  additive  models 
shows  that  they  are  basically  equivalent  if  we  set  I  =  J.  Accordingly, 
only the BAM architecture is discussed here. It is shown in Figure 9. It
consists of the adaptive building block with provisions for adding the
input distributions to the adaptive terms and thresholding the sum. This
is done in the loops shown below the adaptive optics.

THRESHOLD OPTICS

FIGURE 9. BAM MODEL (FOR ADDITIVE, I = J)
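
For readers who want the BAM behaviour in algorithmic form, the following is a minimal software sketch. It assumes the standard discrete bidirectional associative memory with bipolar patterns and outer-product weights, which may differ in detail from the continuous model defined earlier in this report; `bam_train`, `bam_recall`, and `threshold` are hypothetical names.

```python
def threshold(total, previous):
    """Hard threshold to +1/-1; a zero sum leaves the previous state unchanged."""
    if total > 0:
        return 1
    if total < 0:
        return -1
    return previous


def bam_train(pairs):
    """Build the associative matrix as a sum of outer products a * b^T."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    w = [[0] * m for _ in range(n)]
    for a, b in pairs:
        for i in range(n):
            for j in range(m):
                w[i][j] += a[i] * b[j]
    return w


def bam_recall(w, a, max_steps=10):
    """Pass activity back and forth between the two slabs until it is stable."""
    n, m = len(w), len(w[0])
    a, b = list(a), [1] * m
    for _ in range(max_steps):
        new_b = [threshold(sum(a[i] * w[i][j] for i in range(n)), b[j])
                 for j in range(m)]
        new_a = [threshold(sum(w[i][j] * new_b[j] for j in range(m)), a[i])
                 for i in range(n)]
        if new_a == a and new_b == b:
            break
        a, b = new_a, new_b
    return a, b


a1, b1 = [1, -1, 1, -1, 1, -1], [1, 1, -1, -1]
a2, b2 = [1, 1, 1, -1, -1, -1], [1, -1, 1, -1]
w = bam_train([(a1, b1), (a2, b2)])
recovered = bam_recall(w, [1, -1, 1, -1, 1, 1])  # a1 with its last element flipped
```

The summation inside `threshold(...)` plays the part of the adding and thresholding loops in the optical layout, and the matrix `w` plays the part of the volume holograms.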

The  backpropagation  architecture  is  more  complex  and  requires 
additional  explanation.  It  is  shown  in  Figure  10.  Two  adaptive  systems 
are  arranged  in  a  square.  Two  additional  photorefractive  crystals  A  and 
two  more  thresholding  crystals  B  are  used.  The  new  thresholding 
crystals  are  operated  so  that  an  incident  beam  will  be  turned  off  rather 
than on at the threshold. Their threshold level is higher than that of the
regular threshold crystals r. The crystals A form a grating proportional to the
product of their inputs. Their inputs are in turn diffracted from this
grating. The diffracted S' beam, containing the square of S' rather than
S' itself, is used as an approximation to the exact form from
backpropagation theory. While cumbersome, this architecture satisfies all
the  basic  requirements  of  an  optical  implementation  of  the  standard 
three-layer  feedforward  backpropagation  algorithm. 
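
The effect of using the square of S' can be illustrated with a small numerical sketch. It rests on assumptions not stated in this section: that S' denotes the derivative of a sigmoid signal function, and that it enters the hidden-layer error term as a multiplicative slope factor, as in standard backpropagation. The name `hidden_deltas` is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_deltas(net_inputs, back_errors, squared=False):
    """Hidden-layer error terms: slope factor times back-propagated error.

    squared=False uses the exact slope S'(net) = S(net) * (1 - S(net)).
    squared=True  uses S'(net) ** 2, the assumed form of the optical
                  approximation discussed in the text.
    """
    deltas = []
    for net, err in zip(net_inputs, back_errors):
        s = sigmoid(net)
        slope = s * (1.0 - s)
        if squared:
            slope = slope * slope
        deltas.append(slope * err)
    return deltas


nets = [-2.0, 0.0, 1.5]
errors = [0.3, -0.7, 0.1]
exact = hidden_deltas(nets, errors)
approx = hidden_deltas(nets, errors, squared=True)
```

Because the sigmoid slope lies between 0 and 0.25, squaring it never changes the sign of an error term; it only shrinks the magnitude, most strongly for saturated units. Under these assumptions the approximation keeps the direction of each weight correction while reducing its size.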




FIGURE 10. BACKPROPAGATION OPTICS